|
Good–Turing frequency estimation is a statistical technique for estimating the probability of encountering an object of a hitherto unseen species, given a set of past observations of objects from different species. In drawing balls from an urn, the 'objects' would be balls and the 'species' would be the distinct colors of the balls (finite but unknown in number). After drawing some red balls, some black balls and some green balls, we would ask what the probability is of next drawing a red ball, a black ball, a green ball, or a ball of a previously unseen color.

==Historical background==
Good–Turing frequency estimation was developed by Alan Turing and his assistant I. J. Good as part of their efforts at Bletchley Park to crack German ciphers for the Enigma machine during World War II. Turing at first modeled the frequencies as a multinomial distribution, but found it inaccurate. Good developed smoothing algorithms to improve the estimator's accuracy.

The discovery was recognized as significant when Good published it in 1953, but the calculations were difficult, so it was not used as widely as it might have been.〔Newswise: Scientists Explain and Improve Upon 'Enigmatic' Probability Formula, a popular review〕 The method even gained some literary fame due to the Robert Harris novel ''Enigma''.

In the 1990s, Geoffrey Sampson worked with William A. Gale of AT&T to create and implement a simplified and easier-to-use variant of the Good–Turing method,〔Sampson, Geoffrey and Gale, William A. (1995) Good–Turing frequency estimation without tears〕 described below.

Excerpt source: Wikipedia, the free encyclopedia.
|